TITLE OF THE INVENTION

A MULTI-WORD ARITHMETIC DEVICE FOR FASTER COMPUTATION OF 
CRYPTOSYSTEM CALCULATIONS 

This application is based on application No. 11-099657
filed in Japan, the content of which is hereby incorporated by
reference.

BACKGROUND OF THE INVENTION 

Field of the Invention 

This invention relates to a device for executing modular
arithmetic on multi-word (multiple-precision) integers, and in
particular to a device for executing two or more types of
modular arithmetic.

Background Art 

Many encryption systems use calculations performed on
multi-word integers in a finite field. Here, a multi-word
integer is an integer with a word-length exceeding the
32-bit word-length customarily used in a conventional CPU, for
example 160 bits. If such a cryptosystem is to be implemented
by a communication device or similar, an arithmetic unit
capable of performing multi-word arithmetic at high speed is
required.

An arithmetic unit for performing encryption according to
the RSA (Rivest, Shamir, Adleman) public-key cryptosystem is
conventionally realized by manufacturing a specialized LSI
formed from a multiplier and memory. Such an arithmetic unit
is only capable of performing exponential modular arithmetic
on multi-word integers. This computation is performed by
repeatedly using a multiplier with a short bit-length. The
arithmetic unit is used in combination with the CPU as a
coprocessor.

One public-key cryptosystem that has recently been
gaining ground as an alternative to RSA encryption is
elliptic curve cryptology (ECC). ECC is secure against
attacks, such as index calculus, that are effective against
RSA encryption, and uses key data with a much shorter word-
length than that used in RSA encryption, while still
preserving sufficiently high security. For example, the same
level of security provided by a 1024-bit key in RSA encryption
can be achieved in ECC with only a 160-bit key.

However, achieving such high security with ECC requires a
variety of other computations in addition to the exponential
modular arithmetic necessary for RSA encryption. These
include the four basic arithmetic operations, and computation
performed using complex processing which is predetermined but
includes conditional branches.

As a result, when ECC computation is performed using the
above-mentioned specialized RSA encryption coprocessor, only a
very limited number of calculations can be executed. In other
words, most of the computation is performed by the CPU, so
that overhead resulting from exchanges of control signals
between the CPU and the coprocessor increases, thereby
preventing high-speed processing from being realized.

On the other hand, if a software-based method in which the
CPU executes all of the types of calculation necessary for ECC
is used, the use of multi-word computation data requires the
CPU to access the memory at an extremely high frequency. As a
result, data cannot be supplied efficiently to the arithmetic
unit in the CPU, preventing the realization of high-speed
processing.

SUMMARY

One object of this invention is to provide a multi-word
arithmetic device capable of high-speed execution of the
various types of multi-word arithmetic required for elliptic
curve cryptology and the like.

A further object is to provide a multi-word arithmetic
device capable of executing, using a small-scale circuit, an
operation selected from a plurality of types of multi-word
arithmetic.

As is clear from the above explanation, the multi-word
arithmetic device in the present invention executes modular
arithmetic on multi-word integers, in accordance with 
instructions from an external device and includes the 
following: a memory, an arithmetic unit, a memory input/output 
circuit and a control circuit. The arithmetic unit executes, 
on word units, at least two types of calculation, including 
addition and multiplication, and outputs a one-word 
calculation result. The memory input/output circuit performs 
(1) a first data transfer for storing in the memory at least 
one integer received from an external device, (2) a second 
data transfer for inputting at least one integer stored in the 
memory into the arithmetic unit in word units, (3) a third 
data transfer for storing in the memory the calculation result 
output from the arithmetic unit, and (4) a fourth data 
transfer for outputting the calculation result from the memory 
to the external device. The control circuit, according to 
instructions received from the external device, (a) specifies, 
to the memory input/output unit, data to be transferred by the 
second and third data transfers, and (b) specifies, to the 
arithmetic unit, a type of calculation to be executed, thereby 
controlling (i) the arithmetic unit to selectively perform one 
of at least two types of modular arithmetic on the at least 
one integer stored in the memory; and (ii) the memory
input/output circuit to store the calculation result of the
modular arithmetic into the memory. 

In this construction, a multi-word arithmetic device, 
having received instructions from an external device such as a 
CPU, acts independently of the external device to selectively 
execute one of two or more types of modular arithmetic 
required in elliptic curve cryptology. As a result the multi- 
word arithmetic device can be used as a coprocessor, thereby 
enabling high-speed multi-word arithmetic to be realized. 

In addition, the multi-word arithmetic device performs 
multi-word arithmetic by repeatedly using an arithmetic unit 
operating in word units, in place of a long-word arithmetic 
unit. This means that the multi-word arithmetic device can be 
realized by a small-scale circuit. 

Furthermore, the actual content of operations performed by 
the arithmetic unit and the memory input/output unit is not 
fixed, but is determined by a control circuit which receives 
instructions from the external device. As a result, 
controlling the number of times that the arithmetic unit is 
used and the like enables a flexible multi-word arithmetic 
unit capable of executing modular arithmetic at a variety of 
different security levels, i.e. on integers having a variety 
of word-lengths, to be realized without altering any hardware. 



Here, at least two integers are stored in the memory, and 
the arithmetic unit includes an adder for adding at least two 
pieces of one-word data; and a multiplier for multiplying at 
least two pieces of one-word data. The memory input/output 
circuit simultaneously reads one word from each of the at 
least two integers stored in the memory, and outputs the read 
words to one of the adder and the multiplier. 

This construction enables two pieces of data on which 
calculation is to be performed to be input simultaneously into 
the arithmetic unit, so that processing can be performed 
faster than would be the case if such data were input 
sequentially. 

Here, the memory is divided into two dual-port memories, 
each allowing access to two storage areas designated by two 
addresses, and allowing (1) two read operations, or (2) one 
read operation and one write operation to be performed 
simultaneously on word units. The at least two integers are
stored in each dual-port memory so that the memory
input/output circuit can simultaneously (1) read a piece of
one-word data from each of the integers stored
in the two dual-port memories, and have the read pieces of
data input into one of the adder and the multiplier, and (2)
write a piece of one-word data output from one of the adder
and the multiplier into one of the two dual-port memories.
This construction enables input of data from the memory to the 
arithmetic unit to be performed simultaneously with output of 
data from the arithmetic unit to the memory. As a result, 
overhead generated when data transfer is performed can be kept 
to a minimum. In other words, input and output to and from 
the memory is performed repeatedly in word units without any 
pauses, enabling high-speed processing to be performed. 

The arithmetic unit, according to instructions from the 
control circuit, executes one of the following three 
calculations: (1) addition of at least two pieces of one-word 
data; (2) multiplication of two pieces of one-word data; and 
(3) multiplication of two pieces of one-word data and 
accumulation of multiplication results. The arithmetic unit 
includes a multiplier receiving an input of two pieces of one- 
word data and outputting a piece of two-word data, an adder 
receiving an input of at least two pieces of two-word data, 
including a piece of two-word data output from the multiplier, 
and outputting a piece of multi-word data, and a selecting 
circuit selecting, according to instructions from the control 
circuit (1) data to be input into one of the multiplier and 
the adder out of data transmitted from the memory input/output 
circuit; and (2) data to be output as the calculation result
out of data output from one of the adder and the multiplier.

In this construction, the arithmetic unit performs one of
three types of calculation according to a specification from
the control circuit, despite being equipped with only one
adder and one multiplier. This enables a multi-word
arithmetic device capable of executing a wide variety of types
of modular arithmetic to be realized with only a small-scale
circuit.

Here, the at least two types of modular arithmetic include
modular addition. On receiving, from the external device, an
instruction to execute modular addition and an indication of a
number of words n for each integer on which modular addition
is to be performed, the control circuit controls the memory
input/output circuit and the arithmetic unit to execute the
following processing. (1) The memory input/output circuit
obtains from the external device and stores in the memory two
n-word integers A and B on which modular addition is to be
executed and an n-word integer P showing a modulus. Then, (2)
the memory input/output circuit (a) reads simultaneously, from
the integers A, B and P stored in the memory, pieces of one-
word data a, b and p, each with a same digit position, and has
the read pieces of data input into the arithmetic unit, while
(b) storing in the memory a piece of one-word data w output
from the arithmetic unit, and repeats processes (a) and (b)
sequentially from a lowest-order word in each integer until n
words of data are obtained, enabling an n-word integer W to be
stored in the memory. (3) The arithmetic unit repeats n times
a process in which the pieces of data a, b and p received from
the memory input/output circuit are computed as a + b - p,
propagating a carry, and a result w is output. In this
construction, the multi-word arithmetic unit speculatively
executes a modular addition A+B-P so that when A and B are
such that P<A+B<2P, the modular addition of integer A and
integer B can be completed by using only the processing in (1)
to (3) above.

In addition, the control circuit determines whether a carry
has been generated by the arithmetic unit immediately after
completion of the processing (1) to (3) above, and if a carry
has been generated, further controls the memory input/output
circuit and the adder to execute the following processing.
(4) The memory input/output circuit (a) reads simultaneously,
from the integers W and P stored in the memory, pieces of one-
word data w and p, each with a same digit position, and has
the read pieces of data input into the arithmetic unit, while
(b) storing in the memory a piece of one-word data c output
from the arithmetic unit, and repeats processes (a) and (b)
sequentially from a lowest-order word in each integer until n
words of data are obtained, enabling an n-word integer C to be
stored in the memory. Then, (5) the arithmetic unit repeats n
times a process in which the pieces of data w and p received
from the memory input/output circuit are computed as w + p,
propagating a carry, and a result c is output. This
construction enables adjustment (recovery of mod P) to be
performed when the result of A+B-P in the processing of (1) to
(3) is negative.
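
The word-serial procedure in (1) to (5) can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the helper names (to_words, from_words, mod_add_words) are hypothetical, a 32-bit word is assumed, A and B are assumed to be smaller than P, and a signed Python integer stands in for the carry (borrow) that the adder propagates between words.

```python
WORD_BITS = 32                      # assumed word size
MASK = (1 << WORD_BITS) - 1

def to_words(x, n):
    """Split a non-negative integer into n little-endian words."""
    return [(x >> (WORD_BITS * i)) & MASK for i in range(n)]

def from_words(ws):
    """Rebuild an integer from little-endian words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(ws))

def mod_add_words(a, b, p):
    """Return (A + B) mod P as n words, assuming A, B < P.

    a, b, p are little-endian word lists of equal length n.
    """
    n = len(p)
    # Processing (2)-(3): speculative W = A + B - P, lowest word first.
    w, carry = [], 0
    for i in range(n):
        t = a[i] + b[i] - p[i] + carry
        w.append(t & MASK)
        carry = t >> WORD_BITS      # -1 (borrow), 0 or +1
    if carry >= 0:                  # A + B - P was not negative
        return w
    # Processing (4)-(5): adjustment C = W + P, lowest word first.
    c, carry = [], 0
    for i in range(n):
        t = w[i] + p[i] + carry
        c.append(t & MASK)
        carry = t >> WORD_BITS      # the final carry is simply dropped
    return c
```

In this sketch the second loop runs only when the first pass ends with a borrow, which corresponds to the case where A+B-P is negative and P has to be added back.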

Furthermore, the at least two types of modular arithmetic
include Montgomery reduction calculating a residue for A·R^(-1)
mod P, when each word has k bits, A is a 2n-word integer used
for input data, R is an integer 2^(k·n) and P is an n-word
integer. Upon receiving, from the external device, an
instruction to execute Montgomery reduction and an indication
of a number of words 2n for an integer A on which Montgomery
reduction is to be performed, the control circuit controls the
memory input/output circuit and the arithmetic unit to execute
Montgomery reduction. This construction realizes a multi-word
arithmetic device executing Montgomery reduction, which is
modular arithmetic based on a high-speed processing algorithm.

Furthermore, when receiving an instruction to execute
Montgomery reduction from the external device, the control
circuit controls the memory input/output circuit and the
arithmetic unit so as to execute the following processing. (1)
The memory input/output circuit acquires integers A, P and V
from the external device and stores the obtained integers in
the memory, the integer V being -P^(-1) mod R. (2) The
arithmetic unit computes partial products for words from each
of (i) a lower n words of the integer A stored in the memory,
and (ii) the integer V, and accumulates words in partial
products having a same digit position, repeating the process
sequentially from a lowest word in each integer until n words
of accumulated results are obtained, and storing the
accumulated results in the memory as a piece of n-word
intermediate data B. (3) The arithmetic unit computes partial
products for words from each of (a) the piece of intermediate
data B and (b) the integer P stored in the memory, and
accumulates words in the partial products having a same digit
position so that, when a lowest word is a 0th word,
accumulated results for a 0th to (n-3)th word are not
obtained, but accumulated results for a (n-2)th word to a
(2n-1)th word are obtained and stored in the memory as the upper
words of a piece of intermediate data D. (4) The
arithmetic unit (a) generates (i) a carry obtained from a one-
word addition performed by adding a lowest word from each of
the piece of intermediate data D and an integer AA, and (ii) a
one-bit logical value, the integer AA being an upper (n+1)
words of the integer A, and the one-bit logical value being 0
when a one-word addition result is 0, and 1 when the one-word
addition result is not 0. The arithmetic unit then (b) adds
an upper n words of the piece of intermediate data D, an upper
n words of the integer AA, the carry and the one-bit logical
value, by repeating addition of word units sequentially from a
lowest word in each integer, while propagating a carry, until
n words of data are obtained, and stores an addition result in
the memory as a piece of n-word output data M. (5) When the
output data M stored in the memory is at least as large as the
integer P, the arithmetic unit subtracts the integer P from
the output data M until the output data M is 0 or a positive
integer smaller than the integer P, by repeating subtraction
of word units sequentially from a lowest word in each integer,
while propagating a carry, until n words of data are obtained,
and stores the subtraction results in the memory as a new
piece of n-word output data M.

The multiplication in processes (2) and (3) of this
construction is performed by computing and accumulating only
required partial products, rather than computing all possible
partial product combinations. This enables multiplication
processing to be shortened.
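
As a rough illustration of the saving, the partial products can be counted under the assumption that each pair of words costs one multiplier operation and that only pairs falling in a required digit position are computed. The counts below follow from the digit positions named in processing (2) and (3); they are a worked estimate, not figures stated in this specification.

```latex
% Processing (2): digit positions 0 .. n-1 of (lower n words of A) x V
N_2 = \sum_{c=0}^{n-1} (c+1) = \frac{n(n+1)}{2}

% Processing (3): B x P with digit positions 0 .. n-3 skipped
N_3 = n^2 - \sum_{c=0}^{n-3} (c+1) = n^2 - \frac{(n-1)(n-2)}{2}

% Example for n = 5:
N_2 = 15, \qquad N_3 = 25 - 6 = 19, \qquad N_2 + N_3 = 34 \quad (\text{versus } 2n^2 = 50)
```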

Furthermore, in processing (4), the arithmetic unit adds a
piece of one-word data containing all ones to the piece of
intermediate data D and the integer AA, and stores an upper n
words of an obtained addition result in the memory as the
output data M. The addition of the four pieces of data in
processing (4) can be replaced in this construction with
addition of three pieces of data, thereby allowing, for
example, calculation that would have been performed on two
separate occasions by the three-input adder to be performed on
one occasion.

Furthermore, in processing (2) and (3), the arithmetic unit
selects sets of word pairs, each set formed from all the pairs
of words that generate a partial product with a same digit
position, sets input values in the multiplier, and computes
and accumulates the partial products for the selected pairs of
words in sequence from the set with a lowest digit position.
In this construction, the computation and accumulation of
partial products is executed in an efficient order, so that
pipeline irregularities are unlikely to be generated.

Furthermore, in processing (2) and (3), the arithmetic unit
stores in the memory as part of a multiplication result a
lower word from a two-word accumulated result obtained by
accumulating partial products with the same digit position,
and adds an upper word from the accumulated result to partial
products that have a digit position one word higher and are
thus the next to be calculated. The arithmetic unit also
performs an operation for storing a lower word from the
accumulated result in the memory, simultaneously with an
operation for adding an upper word from the accumulated result
to partial products that have a digit position one word higher
and are thus the next to be calculated.

In this construction, accumulation of partial products is
performed simultaneously with processing for propagating an
upper word of the accumulation result to a partial product
having a higher order digit (higher digit position). This
means that accumulation for all of the partial products can be
performed at high speed.
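
The ordering described above, in which all partial products of one digit position are accumulated, the lower word is stored and the upper part is carried into the next digit position, can be modelled by the following sketch. The function and helper names are hypothetical, a 32-bit word is assumed, and an unbounded Python integer stands in for the accumulator register, so the sketch shows the order of operations rather than the circuit's bit widths or timing.

```python
WORD_BITS = 32                      # assumed word size
MASK = (1 << WORD_BITS) - 1

def to_words(x, n):
    """Split a non-negative integer into n little-endian words."""
    return [(x >> (WORD_BITS * i)) & MASK for i in range(n)]

def from_words(ws):
    """Rebuild an integer from little-endian words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(ws))

def mul_by_columns(x, y):
    """Multiply two little-endian word lists, one digit position at a time."""
    n, m = len(x), len(y)
    out = []
    acc = 0                         # accumulator carried between digit positions
    for col in range(n + m - 1):    # lowest digit position first
        # All word pairs (i, col - i) whose product lands in this position.
        for i in range(max(0, col - m + 1), min(col, n - 1) + 1):
            acc += x[i] * y[col - i]
        out.append(acc & MASK)      # lower word is stored as a result word
        acc >>= WORD_BITS           # upper part joins the next position
    out.append(acc & MASK)          # highest word of the product
    return out
```

For two five-word inputs, `from_words(mul_by_columns(to_words(a, 5), to_words(b, 5)))` reproduces the ordinary product `a * b`; restricting the range of `col` gives the partial results used in processing (2) and (3).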

Furthermore, when computing and accumulating partial 
products in processing (2) and (3), the arithmetic unit 
updates accumulated values by (a) simultaneously (i) computing 
a partial product and (ii) reading a previously accumulated 
one-word value from the memory, (b) adding the accumulated 
one-word value to a corresponding word in the partial product, 
and (c) storing a result of the addition in a corresponding 
area of the memory. 



This construction enables selection of the pairs of data to 
be multiplied to be performed with greater flexibility. 



BRIEF DESCRIPTION OF THE DRAWINGS 

These and other objects, advantages and features of the 
invention will become apparent from the following description 
thereof taken in conjunction with the accompanying drawings 
which illustrate a specific embodiment of the invention. In 
the drawings:

Fig. 1 is a block diagram showing a circuit construction 
for a multi-word arithmetic device in the invention; 

Fig. 2 is a circuit diagram showing a detailed construction 
of an arithmetic unit in the multi-word arithmetic device; 

Fig. 3 is a circuit diagram showing a detailed construction 
of a memory input/output unit in the multi-word arithmetic 
device; 

Fig. 4 is a flowchart showing an overall operating 
procedure for the multi-word arithmetic device; 

Fig. 5 shows a calculation formula for modular addition 
performed by the multi-word arithmetic device and examples of 
input data obtained by the multi-word arithmetic device from 
an external device; 

Figs. 6A and 6B show a memory map of a memory when modular
addition is performed by the multi-word arithmetic device;

Fig. 7 is a flowchart showing an operating procedure when 
modular addition is performed by the multi-word arithmetic 
device; 

Fig. 8A shows the operational state (calculation function) 
and input data for the arithmetic unit when the first 
processing from Fig. 7 (Steps S210 to S212) is performed; 

Fig. 8B shows the operational state (calculation function)
and input data for the arithmetic unit when the second
processing from Fig. 7 (Steps S214 to S216) is performed;

Fig. 8C shows the operational state (calculation 
function) and input data for the arithmetic unit when the 
third processing from Fig. 7 (Steps S217 to S219) is 
performed; 

Fig. 9A is a timechart showing a pipeline operation for the 
arithmetic unit when the first processing from Fig. 7 is 
performed; 

Fig. 9B is a timechart showing a pipeline operation for the 
arithmetic unit when the second processing from Fig. 7 is 
performed; 

Fig. 9C is a timechart showing a pipeline operation for the 
arithmetic unit when the third processing from Fig. 7 is 
performed;

Fig. 10 shows a calculation formula for Montgomery 
calculation used by the multi-word arithmetic device and 
examples of input data obtained by the multi-word arithmetic 
device from an external device; 

Figs. 11A and 11B show a memory map for the memory when
Montgomery calculation is performed by the multi-word
arithmetic device;

Fig. 12A shows the operating state and input data for the 
arithmetic unit when partial products with a same digit 
position are computed and accumulated for a first time in the 
Montgomery calculation of step 1; 

Fig. 12B shows the operating state and input data for the
arithmetic unit when partial products with a same digit
position are computed and accumulated for a second time
onwards in the Montgomery calculation of step 1;

Fig. 13 shows a calculating procedure when the arithmetic unit
executes the Montgomery calculation in step 1;

Fig. 14A shows the operating state and input data for the 
arithmetic unit when partial products with a same digit 
position are computed and accumulated for a first time in the 
first half of the processing (BxP) in the Montgomery 
calculation of step 2; 




Fig. 14B shows the operating state and input data for the 
arithmetic unit when partial products with a same digit 
position are computed and accumulated for a second time 
onwards in the first half of the processing (BxP) in the 
Montgomery calculation of step 2; 

Fig. 14C shows the operating state and input data for the 
arithmetic unit when the second half of the processing (BxP) 
in the Montgomery calculation of step 2 (adding A to the 
processing result of the first half of the processing BxP) is 
executed; 

Fig. 15 shows a calculating procedure when the arithmetic
unit executes the Montgomery calculation in step 2;

Fig. 16A shows the operating state and input data for the
arithmetic unit when the first half of the processing for the
Montgomery calculation in step 3 (M+Q or N+Q) is performed;

Fig. 16B shows the operating state and input data for the
arithmetic unit when the second half of processing for the
Montgomery calculation in step 3 (M+P or N+P) is performed;

Fig. 17 is a circuit diagram showing a construction for an
arithmetic unit in an alternative embodiment having a subtract
function;

Fig. 18A shows a circuit construction for a sign inverter 
in an arithmetic unit of an alternative embodiment; and 
Fig. 18B shows operations performed by the sign inverter in
the arithmetic unit in the alternative embodiment. 

PREFERRED EMBODIMENT 

The following is an explanation of an embodiment of the 
invention, with reference to the drawings. 

Fig. 1 is a block diagram showing a circuit construction 
for a multi-word arithmetic device 100 in the invention. The 
multi-word arithmetic device 100 is a coprocessor (LSI) 
selectively executing two types of multi-word arithmetic based 
on instructions (indicating computation type, length of multi- 
word integers to be computed and the like) from an external 
device (not shown), the two types of multi-word arithmetic
being modular addition of two five-word integers, and 
Montgomery reduction having an input of a ten-word integer. 
The multi-word arithmetic device 100 includes a control unit 
10 whose operation is synchronized with a clock signal 
generated internally, an arithmetic unit 20, a memory 
input/output unit 30 and a memory 40. 

Here, one word is equivalent to the length of data that can 
be computed during one clock cycle, in this case 32 bits. The 
external device is a CPU or similar, provided in a 
communication apparatus or the like that uses the multi-word 
arithmetic device 100.

Modular addition is addition modulo a constant P.
Montgomery reduction is one algorithm used to perform high-
speed modular arithmetic. Montgomery reduction includes a
three-step calculation for finding M = A·R^(-1) mod P for an
input A of approximately the size of P^2, when P and R are
constants such that P < R = 2^m. This calculation is hereafter
referred to as Montgomery calculation. More details may be found by
referring to Ango, Zero Chishikishoumei, Suron (Cryptography,
Zero-Knowledge Interactive Proof, Number Theory) by OKAMOTO
Tatsuaki & OTA Kazuo, pub. Kyoritsu Shuppan 1995.

Input: A (a value of approximately 2m bits)

Precomputation: V = -P^(-1) mod R

Output: M = A·R^(-1) mod P

Processing:

Step 1: B = AxV mod R

Step 2: M = (BxP+A)/R

Step 3: output M mod P
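
The three steps can be checked numerically with ordinary big-integer arithmetic. The sketch below is an assumption-laden model rather than the device's word-serial procedure: the function name montgomery_reduce is hypothetical, P is assumed to be odd so that the inverse modulo R exists, A is assumed to be smaller than P·R, and Python 3.8 or later is assumed for pow() with a negative exponent.

```python
def montgomery_reduce(A, P, m):
    """Return A * R^(-1) mod P for R = 2**m, following Steps 1 to 3.

    Assumes P is odd, P < R and 0 <= A < P * R.
    """
    R = 1 << m
    V = (-pow(P, -1, R)) % R        # precomputation: V = -P^(-1) mod R
    B = (A * V) % R                 # Step 1: only the low m bits matter
    M = (B * P + A) >> m            # Step 2: B*P + A is an exact multiple of R
    while M >= P:                   # Step 3: bring M into the range [0, P)
        M -= P
    return M
```

Because B·P + A is congruent to A·(V·P + 1), which is 0 modulo R, the shift in Step 2 discards only zero bits; with A < P·R the value entering Step 3 is below 2P, so at most one subtraction is needed.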

During one clock cycle, the arithmetic unit 20 either
multiplies two pieces of one-word data or adds three pieces of
one-word data, according to instructions from the control unit
10, and outputs 34-bit data including a piece of one-word data
showing the result or part of the result of this calculation,
and a 2-bit carry. The arithmetic unit 20 is connected to the
memory input/output unit 30 by three data buses 61 to 63 for
outputting and one data bus 64 for inputting.

The memory 40 temporarily stores integers on which multi-
word arithmetic is performed by the multi-word arithmetic
device 100, and intermediate data and calculation results
generated by this calculation process. The memory 40 is
formed from two separate dual-port memories, a first memory 41
and a second memory 42, each of which can be accessed in word
units, and is connected to the memory input/output unit 30 via
four data buses 65 to 68 and four address buses 71 to 74.

Each of the first and second memories 41 and 42 has a
storage capacity of 256 words, and is capable of reading a
piece of one-word data (partial integer) simultaneously from a
maximum of two different storage areas via two input/output
ports during one clock cycle.

The memory input/output unit 30 is an interface circuit
performing data transfer between the arithmetic unit 20 and
the memory 40, and between an external device and the memory
40, according to instructions from the control unit 10.

The control unit 10 includes ROM for storing a control
program, a logic circuit for outputting control signals
according to this program, and RAM. The control unit 10
performs, for example, one of modular addition of two five-
word integers stored in the memory 40 and Montgomery
calculation on a ten-word integer, by controlling the
arithmetic unit 20 and the memory input/output unit 30, based
on instructions (indicating computation type, length of multi-
word integers which are to be computed and the like) from an
external device.

Fig. 2 is a circuit drawing showing a detailed construction
of the arithmetic unit 20 in Fig. 1. The arithmetic unit 20
includes a multiplier 21, a three-input adder 22, a register
23 and three selectors 24 to 26. The notation [n:m] in the
drawing indicates the nth to mth bits of a specified bit
sequence, when the least significant bit is the 0th bit.

The multiplier 21 multiplies two pieces of one-word data
transmitted from the memory input/output unit 30 via two data
buses 61 and 62, and outputs the result of this multiplication
as a piece of two-word data.

The three-input adder 22 adds (a) a piece of two-word data
input into a first input port in1 from the selector 24, (b) a
piece of two-word data input into a second input port in2, the
lower word being a piece of one-word data transmitted from the
memory input/output unit 30 via the data bus 62, and the upper
word being '0', (c) a piece of two-word data input into a third
input port in3 from the selector 25, and (d) a 2-bit carry
input into a carry input terminal (marked 'carry in' in the
drawing) from the selector 26. The obtained 66-bit data (the
upper 2 bits being a carry and the following bits a piece of
two-word data) is output to the register 23.
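
The reason a 2-bit carry is sufficient can be seen from a small model of the addition. The sketch below is a simplification under stated assumptions: the function name three_input_add is hypothetical, all three operands are treated as unsigned two-word (64-bit) values, and negative (two's complement) operands are not modelled.

```python
WORD_BITS = 32                                  # assumed word size
TWO_WORD_MASK = (1 << (2 * WORD_BITS)) - 1      # 64 one-bits

def three_input_add(in1, in2, in3, carry_in=0):
    """Add three unsigned two-word values and a 2-bit carry-in.

    Returns (carry_out, two_word_sum); together they form the 66-bit result.
    """
    for value in (in1, in2, in3):
        assert 0 <= value <= TWO_WORD_MASK
    assert 0 <= carry_in <= 3
    total = in1 + in2 + in3 + carry_in
    carry_out = total >> (2 * WORD_BITS)        # never exceeds 3, so 2 bits suffice
    return carry_out, total & TWO_WORD_MASK
```

Even with every operand at its maximum and a carry-in of 3, the total is exactly 3·2^64, so the carry-out is 3 and still fits in 2 bits.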

The three-input adder 22 can add negative numbers (numbers
represented by a two's complement), so that a carry can be
output when there is an underflow (borrow), and not just when
there is an overflow.

The register 23 stores the 66-bit data output from the
three-input adder 22 for only one clock cycle. In the next
clock cycle, the 66-bit data held in the register 23 is output
as follows. An upper 2-bit carry [65:64] and a middle 2-bit
carry [33:32] are transferred to the selector 26, and the
lower two words of data are transferred to the selector 25,
with the lower 34 bits being output to the memory input/output
unit 30 via the data bus 64.

The selector 24 selects, according to an instruction from
the control unit 10, one of (i) a piece of two-word data
produced by zero-extending the one-word data transmitted from
the memory input/output unit 30 via the data bus 61, and (ii)
a piece of two-word data output from the multiplier 21, and
outputs the selected data to the first input port in1 of the
three-input adder 22.

The selector 25 selects, according to an instruction from
the control unit 10, one of (i) a piece of two-word data
produced by zero-extending a piece of one-word data
transmitted from the memory input/output unit 30, (ii) a piece
of two-word data output from the register 23, and (iii) two-
word data produced by zero-extending the upper word of the
piece of two-word data output from the register 23, and
outputs the selected data to the third input port in3 of the
three-input adder 22.

The selector 26 is a circuit for propagating a carry
generated by the addition performed in a certain clock cycle
by the three-input adder 22 to the addition occurring in a
next clock cycle. The selector 26 selects, according to
instructions from the control unit 10, one of the 2-bit
carries (i) [65:64] and (ii) [33:32] transmitted from the
register 23, and transmits the selected carry to the carry
input terminal of the three-input adder 22.

Fig. 3 is a circuit drawing showing a detailed construction
of the memory input/output unit 30 of Fig. 1. The memory
input/output unit 30 has a bus switch 31, an input/output
control unit 32 and an address generating unit 33.

The bus switch 31 combines a plurality of selector
circuits, and connects each of the four data buses 61 to 64
connected to the arithmetic unit 20 to one of the four data
buses 65 to 68 connected to the memory 40, according to
instructions from the input/output control unit 32.

The address generating unit 33 includes four separate
address registers and an incrementer, and generates four sets
of access control signals (each containing an address signal,
a read/write signal and the like) and outputs the four sets of
signals to four address buses 71 to 74, according to
instructions from the input/output control unit 32.

The input/output control unit 32 controls the bus switch 31
and the address generating unit 33 based on instructions from
the control unit 10 to perform the following operations. The
arithmetic unit 20 performs a maximum of four separate
accesses of the memory 40 simultaneously. The input/output
control unit 32 also performs data transmission between a
connected external device and the
memory 40 via the data bus 69 and an address bus 75, and
transfers to the control unit 10, as a carry signal,
information relating to a carry transmitted from the
arithmetic unit 20.

The following is an explanation of the operation of the 
multi-word arithmetic device 100. 

Fig. 4 is a flowchart showing the general operating
procedure for the multi-word arithmetic device 100.

First, the memory input/output unit 30 receives input data
from the external device via the data bus 69 or the address
bus 75, the input data being integers which are to be
computed, integers resulting from precomputation and the like.
Received integers are stored in a designated area in the
memory 40 (step S200).

Next, the control unit 10 receives an instruction from the
external device indicating which of modular addition and
Montgomery calculation should be performed (step S201).

Upon receiving an instruction indicating that modular
addition should be performed, the control unit 10 transmits
preprogrammed control signals to the arithmetic unit 20 and
the memory input/output unit 30, thereby having the arithmetic
unit 20 execute modular addition on two five-word integers A
and B stored in the memory 40, and having the result of this
calculation C stored in the memory 40 (step S202).

Upon receiving an instruction indicating that Montgomery
calculation should be performed, the control unit 10 transmits
preprogrammed control signals to the arithmetic unit 20 and
the memory input/output unit 30, thereby having the arithmetic
unit 20 execute steps 1 to 3 of the above-described Montgomery
calculation in sequence, using an integer A stored in the
memory 40, or similar, and having a final result M stored in
the memory 40 (steps S203 to S205).

Note that the modular addition result C and Montgomery 
calculation result M are read by the external device via the 
memory input/output unit 30.

The following is an explanation of an actual example of 
computation performed by the multi-word arithmetic device 100. 

First, modular addition (C=A+B mod P) performed by the
multi-word arithmetic device 100 is explained with reference
to Figs. 5 to 9.

Fig. 5 shows a calculation formula for modular addition and
examples of input data transferred to the multi-word
arithmetic device 100 from the external device when modular
addition is performed, in this case examples of input data A,
B, P and Q stored in the memory 40 via the memory input/output
unit 30.

Integer A is one calculation object for modular addition,
and is a five-word integer in which five words a4, a3, a2, a1
and a0 are arranged in sequence starting with the most
significant digit (this kind of multi-word integer is
hereafter written as [a4, a3, a2, a1, a0] or similar). Integer
B is another calculation object for modular addition, and is a
five-word integer [b4, b3, b2, b1, b0]. Integer P is a modulus
used for modular addition, and is a five-word integer [p4, p3,
p2, p1, p0]. Integer Q is a five-word integer [q4, q3, q2, q1,
q0] equal to a value -P produced by inverting the sign of
integer P.

Fig. 6 shows a memory map of the memory 40 when modular
addition is performed by the multi-word arithmetic device 100.
Here, the above four pieces of input data A, B, P and Q are
shown along with a five-word integer C [c4, c3, c2, c1, c0]
for storing the calculation result and intermediate data W [w4,
w3, w2, w1, w0] generated by the modular addition.

The first memory 41 stores integers A, P and Q, and the
second memory 42 stores integers B and C and intermediate data
W. A memory map like the one in the drawing enables the
arithmetic unit 20 to simultaneously transfer two words
selected from the integers A, P and Q, and two words selected
from the integers B, C and W, during one clock cycle.

Fig. 7 is a flowchart showing the operating procedure by
which the multi-word arithmetic device 100 executes modular
addition, in other words the detailed procedure for step S202
in Fig. 4.

The modular addition performed by the multi-word arithmetic
device 100 can be broadly divided into three processes. In a
first process, modular addition of an individual word is
repeated five times (steps S210 to S212). In a second
process, modular addition for an individual word is repeated
five times (a recovery operation) when a carry has been
generated by the first process (steps S214 to S216). In a
third process, data transmission for substituting the
intermediate data W into the calculation result C is repeated
five times, when a carry has not been generated by the first
process (steps S217 to S219).

Figs. 8A to 8C show the operating state (calculation
function) and input data for the arithmetic unit 20 for the
first process (steps S210 to S212), second process (steps S214
to S216) and third process (steps S217 to S219) of Fig. 7
respectively.

In the first process, the arithmetic unit 20 operates as a
one-word three-input adder, adding three pieces of data a_i,
b_i and q_i, and substituting the result of the addition into a
piece of data w_i. In the second process, the arithmetic unit
20 operates as a one-word two-input adder, adding two pieces
of data p_i and w_i and substituting the result of the addition
into a piece of data c_i. In the third process, the arithmetic
unit 20 operates as a one-word data transfer unit, substituting
the piece of data w_i into the piece of data c_i.

The operating state of the arithmetic unit 20 is determined
by control signals output to the arithmetic unit 20 from the
control unit 10. The input data for the arithmetic unit 20 is
determined by control signals output to the memory
input/output unit 30 from the control unit 10. Moreover, the
output of a fixed value '0' to one of the input ports of the
three-input adder 22 is realized by controlling the selectors
24 and 25 or the memory input/output unit 30 to output a piece
of data that contains '0' in all its bit positions.

Figs. 9A to 9C are timecharts showing pipeline processing 
performed by the arithmetic unit 20 for the first process 
(steps S210 to S212), the second process (steps S214 to S216)
and the third process (steps S217 to S219) respectively. The 
register 23 in the arithmetic unit 20 holds the output from 
the three-input adder 22, so that two stages of the pipeline, 
calculation performed by the three-input adder 22 and storage 
in the memory 40 of the previous calculation result obtained 
by the three-input adder 22, can be executed in parallel 
during one clock cycle. 

In the first process, as is shown in Fig. 7, the control
unit 10 first transmits control signals to the arithmetic unit
20 and the memory input/output unit 30, thereby putting the
arithmetic unit 20 in the operating state shown in Fig. 8A.
Next, the control unit 10 outputs an initializing control
signal to the arithmetic unit 20, thereby setting both a value
Reg held in the register 23 and a carry Car (Reg [33:32]) at
an initial value of '0' (step S210).

Then, the arithmetic unit 20, during each clock cycle,
repeats in parallel (i) the operation for adding two pieces of
data a_i and q_i transmitted from the first memory 41 via the
memory input/output unit 30, the piece of data b_i transmitted
from the second memory 42 and the carry Car generated during
the previous calculation, and storing the result of the
addition in the register 23, and (ii) the operation for
writing a lower word from the held value Reg in the register
23 into a storage area w_i in the second memory 42 (step S211).

This means that the arithmetic unit 20 repeats the pipeline
processing as shown in Fig. 9A in the following way. During a
first clock cycle, the arithmetic unit 20 adds three pieces of
data a0, b0 and q0, and stores the result of this addition as
the value Reg in the register 23. Then in a subsequent second
clock cycle, the arithmetic unit 20 adds three pieces of data
a1, b1 and q1 and a carry Car generated by the calculation in
the first clock cycle, and stores the result as the value Reg
in the register 23, while simultaneously writing the value Reg
held in the register 23 as a result of the previous
calculation in a storage area w0 in the second memory 42.



The arithmetic unit 20 repeats calculation and storage of a
calculation result in the second memory 42 five times in
total, i.e. for five words, under the control of the control
unit 10 (steps S211 and S212). As a result, the computation
for W=A+B+Q, in other words W=A+B-P, is completed.
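
The word-serial addition of the first process can be sketched
as below. This is a minimal sketch under stated assumptions,
not the device's actual logic: words are 32 bits wide, word
lists are ordered least significant first (the reverse of the
[a4, ..., a0] notation), and the names first_process, to_words
and from_words are hypothetical. The write-back of each lower
word, which the device overlaps with the next addition, is
collapsed into the same loop iteration here.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1

def to_words(x, n=5):
    """Split an integer into n words, least significant word first."""
    return [(x >> (WORD_BITS * i)) & WORD_MASK for i in range(n)]

def from_words(ws):
    """Rebuild an integer from words listed least significant first."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(ws))

def first_process(a, b, q):
    """Word-serial W = A + B + Q; returns (words of W, final carry)."""
    reg = 0                        # models register 23, cleared in step S210
    w = []
    for a_i, b_i, q_i in zip(a, b, q):
        car = reg >> WORD_BITS     # carry left by the previous cycle (0 to 2)
        reg = a_i + b_i + q_i + car
        w.append(reg & WORD_MASK)  # lower word is written back as w_i
    return w, reg >> WORD_BITS
```

With three 32-bit inputs and a carry of at most 2, each sum
stays below 2^34, which is why a 2-bit carry field suffices.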

Next, the control unit 10 determines whether a carry Car
(here a borrow) has been generated by the addition in a fifth
clock cycle (step S213). If a carry Car has been generated,
the control unit 10 has the arithmetic unit 20 execute the
second process (steps S214 to S216), but if not, it has the
arithmetic unit 20 execute the third process (steps S217 to
S219).

The reason for this is that, if the intermediate data W 
obtained in the first process is a negative value, the final 
result C (A+B mod P) is obtained by adding the modulus P to
the intermediate data W (value for performing recovery 
operation). If, however, the intermediate data W is a positive 
value, this piece of data is used directly as the final result 
C. 

In the second process, the control unit 10 first transmits
control signals to the arithmetic unit 20 and the memory
input/output unit 30, thereby putting the arithmetic unit 20
in the operating state shown in Fig. 8B. Next, the control
unit 10 outputs an initializing control signal to the
arithmetic unit 20, thereby setting both a value Reg held in
the register 23 and a carry Car (Reg [33:32]) at an initial
value of '0' (step S214).

Then, the arithmetic unit 20, during each clock cycle,
repeats in parallel (i) the operation for adding a piece of
data p_i and a piece of data w_i transmitted via the memory
input/output unit 30 from the first memory 41 and the second
memory 42 respectively, to the carry Car generated during the
previous calculation, and storing the result of the addition
as the value Reg in the register 23, and (ii) the operation
for writing a lower word from the value Reg held in the
register 23 into a storage area c_i in the second memory 42
(step S215).

This means that the arithmetic unit 20 repeats the pipeline
processing as shown in Fig. 9B in the following way. During a
first clock cycle, the arithmetic unit 20 adds two pieces of
data p0 and w0, and stores the result of this addition as the
value Reg in the register 23. Then in a subsequent second
clock cycle, the arithmetic unit 20 adds the two pieces of
data p1 and w1 and a carry Car generated by the computation in
the first clock cycle, and stores the result as the value Reg
in the register 23, while simultaneously writing the value Reg
held in the register 23 as a result of the previous
calculation in a storage area c0 in the second memory 42.

The arithmetic unit 20 repeats calculation and storage of
a calculation result in the second memory 42 five times in
total, i.e. for five words, under the control of the control
unit 10 (steps S215 and S216). As a result, the computation
for C=W+P, in other words C=A+B mod P, is completed.

In the third process, the control unit 10 transmits control
signals to the arithmetic unit 20 and the memory input/output
unit 30, thereby initializing the arithmetic unit 20 so that
it is in the operating state shown in Fig. 8C (step S217).

Then, the arithmetic unit 20, during each clock cycle,
repeats in parallel (i) the operation for storing the piece of
data w_i transmitted from the second memory 42 directly in the
register 23, and (ii) the operation for writing a lower word
from the value Reg held in the register 23 into the storage
area c_i in the second memory 42 (step S218).

This means that the arithmetic unit 20 repeats the pipeline
processing as shown in Fig. 9C in the following way. During a
first clock cycle, the arithmetic unit 20 stores the piece of
data w0 in the register 23 as it is. Then in a subsequent
second clock cycle, the arithmetic unit 20 stores the piece of
data w1 as the value Reg in the register 23, while
simultaneously writing the value Reg held in the register 23
from the previous cycle into the storage area c0 in the second
memory 42.

The arithmetic unit 20 repeats data transfer five times in
total, i.e. for five words, under the control of the control
unit 10 (steps S218 and S219). As a result, the computation
for C=W, in other words C=A+B mod P, is completed.

Using the above processing method, the multi-word 
arithmetic device 100 can complete modular addition of five 
words during just ten clock cycles, despite being equipped 
with a small arithmetic unit 20 which is only capable of 
performing calculation on one word during each clock cycle. 
Moreover, if no carry has been generated upon completion of 
the first process, a result W for the modular addition of five 
words can be obtained after only five clock cycles. 
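
The overall flow of the three processes can be sketched as
follows. The sketch is illustrative only and rests on several
assumptions: 32-bit words, five-word operands with 0 <= A, B <
P, and hypothetical helper names. It decides the sign of W
from the carry out of the top word under a plain two's
complement model, in which no carry out means A+B-P is
negative; how the device encodes that condition in its carry
Car (borrow) signal is not modelled.

```python
WORD_BITS = 32
N_WORDS = 5
WORD_MASK = (1 << WORD_BITS) - 1

def to_words(x):
    """Split an integer into five words, least significant first."""
    return [(x >> (WORD_BITS * i)) & WORD_MASK for i in range(N_WORDS)]

def from_words(ws):
    return sum(w << (WORD_BITS * i) for i, w in enumerate(ws))

def add_words(*operands):
    """Add word lists one word per step; return (sum words, carry out)."""
    carry, out = 0, []
    for column in zip(*operands):
        total = sum(column) + carry
        out.append(total & WORD_MASK)
        carry = total >> WORD_BITS
    return out, carry

def modular_add(a, b, p):
    """C = A + B mod P following the first, second and third processes."""
    q = to_words((1 << (WORD_BITS * N_WORDS)) - p)     # Q = -P
    w, carry = add_words(to_words(a), to_words(b), q)  # first process
    negative = carry == 0    # assumption: sign taken from the carry out
    if negative:
        c, _ = add_words(w, to_words(p))  # second process: C = W + P
    else:
        c = list(w)                       # third process: C = W
    return from_words(c)
```

For example, with P = 2^160 - 47, modular_add(P - 1, 5, P)
returns 4 by the third process, while modular_add(3, 4, P)
returns 7 after the recovery addition of the second process.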

The following is an explanation of the operating procedure
used when Montgomery calculation (M = A·R^(-1) mod P) is
executed by the multi-word arithmetic device 100, with
reference to Figs. 10 to 16.

Fig. 10 shows a Montgomery calculation algorithm and
examples of input data transmitted to the multi-word
arithmetic device 100 from the external device when Montgomery
calculation is performed, in other words input data A, P, Q
and V stored in the memory 40 via the memory input/output
unit 30.

The integer A is data on which Montgomery calculation is
performed, and consists of a ten-word integer [a9, a8, ..., a1,
a0]. The integer P is a modulus used in modular arithmetic and
is a five-word integer [p4, p3, p2, p1, p0]. The integer Q is a
five-word integer [q4, q3, q2, q1, q0] produced by inverting
the sign of the integer P (-P). The integer V is a five-word
integer [v4, v3, v2, v1, v0] forming a calculation result for
the above-mentioned precomputation performed by the external
device.

Fig. 11 shows a memory map for the memory 40 when
Montgomery calculation is performed by the multi-word
arithmetic device 100. Here, five-word intermediate data B
[b4, b3, b2, b1, b0] generated by calculation processing,
six-word intermediate data C [c5, c4, c3, c2, c1, c0], a
one-word fixed value E [e0] required for the calculation
processing (0xffffffff; a word containing all ones) and
five-word integers M [m4, m3, m2, m1, m0] and N [n4, n3, n2,
n1, n0] for storing the final result of the Montgomery
calculation are shown in addition to the four pieces of input
data A, P, Q and V.

Integers A, P, Q and M are stored in the first memory 41,
and integer V, intermediate data B and C, fixed value E and
integer N in the second memory 42. Using this kind of memory
map, the arithmetic unit 20 can simultaneously transfer two
words selected from the four pieces of data A, P, Q and M and
two words selected from the pieces of data V, B, C and E,
during one clock cycle.

Step 1 

The following is a detailed explanation of operations
executed in step 1 of the Montgomery calculation performed by
the multi-word arithmetic device 100, in other words step S203
in Fig. 4, with reference to Figs. 12A, 12B and 13.

Figs. 12A and 12B show the operating state and input data
for the arithmetic unit 20 when step 1 of the Montgomery
calculation is executed. The arithmetic unit 20 multiplies
each word a_i forming the integer A with each word v_j forming
the integer V, obtaining partial products with a same digit
position (in this case, one digit is equivalent to one word)
which it then accumulates (totals), and substitutes the
cumulative result into the integer B.

Fig. 12A shows the operating state of the arithmetic unit
20 when a first addition is performed for accumulating partial
products with a same digit position. Here, the selector 25 in
the arithmetic unit 20 selects a piece of two-word data
produced by zero-extending the upper word of a piece of
two-word data output from the register 23. This operation is
performed to add the upper word of a two-word cumulative
value, obtained by accumulating partial products with a same
digit position, to a sum of its upper partial products, in
other words to a sum of the partial products that are
positioned shifted one word to the left of the originally
accumulated partial products.

Fig. 12B shows the operating state of the arithmetic unit 
20 when addition of cumulative values for partial products 
with the same digit position is performed for the second time 
onwards. Here, the selector 25 in the arithmetic unit 20 
selects a piece of two-word data output from the register 23. 

Fig. 13 shows the calculating procedure when step 1 of the
Montgomery calculation is executed by the arithmetic unit 20.
The upper part of the drawing shows the integers A [a4, a3, a2,
a1, a0] and V [v4, v3, v2, v1, v0] on which multiplication is
performed, the central part shows partial products arranged in
order of calculation and the lower part is a representation of
a process in which a sum of partial products having a same
digit position is substituted into a word in the integer B [b4,
b3, b2, b1, b0].

The reason for multiplying only the lower five words of the
ten-word integer A is that, as shown in Fig. 10, step 1 of the
Montgomery calculation only needs to compute a residue modulo
the integer R (mod R).

The actual operation performed in step 1 by the arithmetic 
unit 20 is as follows. 

First, the control unit 10 initializes the arithmetic unit 
20 by transmitting control signals to the arithmetic unit 20 
and the memory input/output unit 30. 

In a first clock cycle, after a control signal from the
control unit 10 puts the arithmetic unit 20 in the operating
state shown in Fig. 12A, the arithmetic unit 20 uses the
multiplier 21 to multiply a piece of data a0 and a piece of
data v0 transmitted via the memory input/output unit 30 from
the first memory 41 and the second memory 42 respectively, and
stores the result of the multiplication in the register 23.

In a second clock cycle, the arithmetic unit 20 uses the
multiplier 21 to multiply a piece of data a1 and a piece of
data v0, transmitted from the first memory 41 and the second
memory 42 respectively, adds the result of this multiplication
to a value obtained by downshifting the multiplication result
obtained in the first clock cycle by one word, and stores the
result of the addition in the register 23. Simultaneously,
the arithmetic unit 20 writes the lower word of the
multiplication result from the first clock cycle held in the
register 23 into a storage area b0 in the second memory 42.

In a third clock cycle, after being put in the operating
state shown in Fig. 12B by a control signal transmitted from
the control unit 10, the arithmetic unit 20 uses the
multiplier 21 to multiply the piece of data a0 and a piece of
data v1 transmitted from the first memory 41 and the second
memory 42 respectively, adds the result of this multiplication
to the two-word cumulative value stored in the register 23 and
stores the result of the addition in the register 23.

In a fourth clock cycle, after being put in the operating
state shown in Fig. 12A by a control signal transmitted from
the control unit 10, the arithmetic unit 20 uses the
multiplier 21 to multiply a piece of data a2 and the piece of
data v0 transmitted from the first memory 41 and the second
memory 42 respectively, adds the result of this multiplication
to a value obtained by downshifting the multiplication result
from the third clock cycle by one word, and stores the result
of the addition in the register 23. Simultaneously, the
arithmetic unit 20 writes a lower word from the multiplication
result of the third clock cycle held in the register 23 into a
storage area b1 in the second memory 42.

Subsequently, the arithmetic unit 20 repeats calculation of
and accumulation of partial products with the same digit
position, for all combinations of data a_i and v_j where the
sum of i and j is no greater than 4, and stores the results of
these calculations in the storage areas b2, b3 and b4.

This completes the processing for step 1. The upper five 
words remaining in the register 23 after the multiplication 
and accumulation in the fifteenth clock cycle have been 
completed are rounded down. 
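
The column-by-column accumulation of step 1 can be sketched as
follows. This is a minimal sketch, assuming 32-bit words and R
= 2^160 (five words); the function and helper names are
hypothetical. Python integers are unbounded, so the finite
width of the register 23 is not modelled, and the write-back
that the device overlaps with the next multiplication is
performed at the end of each column instead.

```python
WORD_BITS = 32
N_WORDS = 5
WORD_MASK = (1 << WORD_BITS) - 1

def to_words(x, n=N_WORDS):
    """Lowest n words of x, least significant word first."""
    return [(x >> (WORD_BITS * i)) & WORD_MASK for i in range(n)]

def from_words(ws):
    return sum(w << (WORD_BITS * i) for i, w in enumerate(ws))

def step1(a, v):
    """B = (A mod R) * V mod R; returns (words of B, multiplication count)."""
    a_w, v_w = to_words(a), to_words(v)  # only the lower five words of A
    reg, b, multiplications = 0, [], 0
    for k in range(N_WORDS):             # k is the digit position i + j
        reg >>= WORD_BITS                # downshift by one word (Fig. 12A)
        for j in range(k + 1):
            i = k - j
            reg += a_w[i] * v_w[j]       # multiply and accumulate
            multiplications += 1
        b.append(reg & WORD_MASK)        # lower word becomes b_k
    return b, multiplications
```

The loop performs 1 + 2 + 3 + 4 + 5 = 15 multiplications, which
agrees with the fifteen clock cycles mentioned above.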

Step 2 

The following is a detailed explanation of step 2 of the 
Montgomery calculation performed by the multi-word arithmetic 
device 100, in other words step S204 in Fig. 4, with reference 
to Figs. 14A, 14B, 14C and 15. 

Figs. 14A and 14B show the operating state and input data
for the arithmetic unit 20 when the first half of the
processing (BxP) for step 2 of the Montgomery calculation is
executed. The arithmetic unit 20 multiplies each word b_i of
the integer B obtained in step 1 with each word p_j of the
integer P, while accumulating the partial products with the
same digit position obtained from this process and
substituting the upper six words of the cumulative result into
the integer C.

Fig. 14A shows the operating state of the arithmetic unit
20 when a first addition of a cumulative value for partial
products with the same digit position is performed. Fig. 14B
shows the operating state of the arithmetic unit 20 when
addition of cumulative values for partial products with a same
digit position is performed for the second time onwards.

Fig. 14C shows the operating state and the input data for
the arithmetic unit 20 when the second half of the processing
(addition of the result of the first half of the processing
BxP and integer A) in step 2 of the Montgomery calculation is
executed. The arithmetic unit 20 adds the integer C obtained
from the first half processing, the one-word fixed integer E
and the upper six words of the integer A, and substitutes the
upper five words of this addition result into the integer M.

Fig. 15 shows the calculating procedure when step 2 of the
Montgomery calculation is executed by the arithmetic unit 20.
The upper part of the drawing shows the integer B [b4, b3, b2,
b1, b0] and the integer P [p4, p3, p2, p1, p0] on which
multiplication for the first half of the processing is
performed. Partial products are arranged in order of
calculation from top to bottom in the central part of the
drawing. The lower part of the drawing is a representation of
a process in which the upper six words of cumulative results
for partial products with the same digit position are
substituted into each word of an integer C [c5, c4, c3, c2, c1,
c0] and the integer C, the integer E and the upper six words of
the integer A are added, the result of the addition being
substituted into the upper five words of the integer M.

Note that the reason for storing only the upper five words
from the result of the above multiplication and addition
(BxP+A) in the integer M is that the relation BxP+A mod R = 0
makes it clear that the lower half of the calculation result
(BxP+A), i.e. the lower five words, must be all zeros.
Therefore, in step 2, the required calculation is executed
focusing only on the upper five words of the calculation
result. However, since a carry from the sixth word (the sixth
from the most significant digit, other words referred to below
also being so defined) to the fifth word is considered when
computing (BxP+A), the multiplication of integers B and P and
the addition of integer A are performed on the upper six words
of the integer.

Furthermore, a word containing all ones is also added when
performing additions for the sixth word. This enables any
carry propagated to the fifth word from the seventh word via
the sixth word to be considered when computing (BxP+A). Since
it has been ascertained, as described above, that the sixth
word for the calculation (BxP+A) must be '0', the carry from the
seventh word only needs to be considered if the result of
adding the data c0 and the data a4 is not '0'. If the result of
adding the data c0 and the data a4 is '0', there is no need to
check for a carry, as any carry can be propagated simply by
adding the integer E (e0).

Note that incorporating the addition of the data e0, having
ones in all its bit positions, in the addition of the data c0
and the data a4 is equivalent to performing one of the
following processing (1) to (4).

(1) When the addition result of data c0 and data a4 is '0',
and the carry is also '0', a carry '0' is added to the computed
data m0 (c1+a5).

(2) When the addition result of data c0 and data a4 is '0',
but the carry is '1', a carry '1' is added to the computed data
m0 (c1+a5).

(3) When the addition result of data c0 and data a4 is not
'0', but the carry is '0', a carry '1' is added to the computed
data m0 (c1+a5).

(4) When the addition result of data c0 and data a4 is not
'0', and the carry is '1', a carry '2' is added to the computed
data m0 (c1+a5).
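
The four cases can be checked with a short sketch. It assumes
32-bit words and reads "the carry" in cases (1) to (4) as the
carry out of the one-word addition itself, which is an
interpretation rather than something stated outright. The two
words added are written c0 and a4 here (the lowest word of C
and the word of A aligned with it), and the function name
carry_into_m0 is hypothetical.

```python
WORD_BITS = 32
ALL_ONES = (1 << WORD_BITS) - 1   # the fixed word e0 = 0xffffffff

def carry_into_m0(c0, a4):
    """Carry passed to the next column when e0 is added to c0 + a4."""
    return (c0 + a4 + ALL_ONES) >> WORD_BITS

cases = {
    "(1) sum word is 0, carry 0": carry_into_m0(0x00000000, 0x00000000),
    "(2) sum word is 0, carry 1": carry_into_m0(0x80000000, 0x80000000),
    "(3) sum word not 0, carry 0": carry_into_m0(0x00000001, 0x00000002),
    "(4) sum word not 0, carry 1": carry_into_m0(0xFFFFFFFF, 0x00000002),
}
```

Under that reading the four entries evaluate to 0, 1, 1 and 2,
the carries listed in (1) to (4).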

The following is an explanation of the actual operation 
performed by the arithmetic unit 20 in step 2. 



In a first clock cycle, after being put in the operating
state shown in Fig. 14A by a control signal transmitted from
the control unit 10, the arithmetic unit 20 uses the
multiplier 21 to multiply two pieces of data b3 and p0,
transmitted via the memory input/output unit 30 from the
second memory 42 and the first memory 41 respectively, and
stores the multiplication result in the register 23.

In a second clock cycle, after being put in the operating
state shown in Fig. 14B by a control signal transmitted from
the control unit 10, the arithmetic unit 20 uses the
multiplier 21 to multiply two pieces of data b2 and p1,
transmitted from the second memory 42 and the first memory 41
respectively, accumulates the obtained multiplication value
with the value stored in the register 23 in the first cycle
and stores the cumulative result in the register 23.

Subsequently, the arithmetic unit 20 computes partial
products (with the same digit position) for all combinations
of b_i and p_j where the sum of i and j is 3, and accumulates
the partial products (third and fourth clock cycles).

In a fifth clock cycle, after being put in the operating
state shown in Fig. 14A by a control signal transmitted from
the control unit 10, the arithmetic unit 20 uses the
multiplier 21 to multiply two pieces of data b4 and p0,
transmitted via the memory input/output unit 30 from the
second memory 42 and the first memory 41 respectively, adds
the multiplication result and a value obtained by downshifting
the value held in the register 23 by one word, and stores the
result in the register 23. Simultaneously, the arithmetic
unit 20 writes a lower word from the result of the
multiplication and accumulation from the fourth cycle held in
the register 23 in a storage area c0 in the second memory 42.

In a sixth clock cycle, after being put in the operating
state shown in Fig. 14B by a control signal transmitted from
the control unit 10, the arithmetic unit 20 uses the
multiplier 21 to multiply two pieces of data b3 and p1,
transmitted from the second memory 42 and the first memory 41
respectively, accumulates the obtained multiplication value
with the value stored in the register 23 and stores the
cumulative result in the register 23.

Subsequently, the arithmetic unit 20 computes partial
products for all combinations of b_i and p_j where the sum of i
and j is from 4 to 8, accumulates the partial products, and
stores the results in the storage areas c1, c2, c3, c4 and c5.

Next, after being put in the operating state shown in Fig. 14C
by a control signal transmitted from the control unit 10, the
arithmetic unit 20 arranges the digits in integers C [c5, c4,
c3, c2, c1, c0] and E [-, -, -, -, -, e0] transmitted from the
second memory 42 and integer A [a9, a8, a7, a6, a5, a4] into
words, and performs addition of corresponding words from each
of the integers, substituting each of the results into an
integer M [m4, m3, m2, m1, m0, -] in the first memory 41.

This means that the arithmetic unit 20 adds the pieces of
data c0 and a4 during the first clock cycle, adds the piece of
data c1, the piece of data a5 and a carry, and substitutes this
result into data m0 during the second clock cycle, and adds the
piece of data c2, the piece of data a6 and a carry, and
substitutes this result into data m1 in a third clock cycle.
Subsequent processing is performed in a similar manner.

This completes the processing for step 2. Note that in
step 2, calculation for partial products of integers B and P
is not performed for partial products having subscripts whose
sum is less than 2, for example b0*p0 and b1*p0. This means
that the processing time required to compute partial products
in this invention is less than that for conventional
processing in which all of the partial products are
multiplied.
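
Why the low partial products can be skipped is easier to see in
a sketch. The code below is an illustration under stated
assumptions, not the device's cycle-accurate behaviour: words
are 32 bits, R = 2^160, B satisfies BxP+A mod R = 0 as stated
above, and every partial product whose subscripts sum to less
than 3 is omitted, following the cycle-by-cycle description
(which starts at the sum 3) rather than the closing note. The
names step2 and to_words are hypothetical, and the result is
not truncated to five words.

```python
WORD_BITS = 32
N_WORDS = 5
WORD_MASK = (1 << WORD_BITS) - 1   # also the all-ones word e0

def to_words(x, n=N_WORDS):
    """Lowest n words of x, least significant word first."""
    return [(x >> (WORD_BITS * i)) & WORD_MASK for i in range(n)]

def step2(a, b, p):
    """Upper words of A + B*P, computed without the low partial products."""
    b_w, p_w = to_words(b), to_words(p)
    t = 0
    for i in range(N_WORDS):
        for j in range(N_WORDS):
            if i + j >= 3:   # assumption: lower digit positions are skipped
                t += (b_w[i] * p_w[j]) << (WORD_BITS * (i + j))
    c = t >> (WORD_BITS * 4)        # upper six words of the truncated product
    a_upper = a >> (WORD_BITS * 4)  # upper six words of A
    # Adding e0 rounds the discarded sixth word up into the fifth word.
    return (c + a_upper + WORD_MASK) >> WORD_BITS
```

The skipped products, together with the lower four words of A,
can only contribute a small carry into the sixth word, and that
carry must bring the sixth word to zero. Adding the all-ones
word therefore produces the same carry into the fifth word as
the full computation would.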

Step 3

The following is a detailed explanation of the operations
performed by the multi-word arithmetic device 100 in step 3 of
the Montgomery calculation, in other words the processing in
step S205 in Fig. 4.

Figs. 16A and 16B show operating states and input data for
the arithmetic unit 20 when step 3 of the Montgomery
calculation is performed. The arithmetic unit 20 uses the
first memory 41 (integer M) and the second memory 42 (integer
N) alternately as temporary working areas (buffers), and
computes a residue of the integer M obtained in step 2 modulo
an integer P (M mod P), storing the result in integer M or
integer N.

Fig. 16A shows the operating state of the arithmetic unit
20 when the first half of processing for step 3 is performed.
In this first half, the arithmetic unit 20 alternates (i)
addition of integer M and integer Q (= -P), and substitution
of the result into integer N, and (ii) addition of integer N
and integer Q and substitution of the result into integer M,
until an obtained integer M (or N) is negative.

Fig. 16B shows the operating state of the arithmetic unit
20 when the second half of processing for step 3 is performed.
The arithmetic unit 20 adds the negative integer M (or N)
obtained in the first half to integer P, and substitutes the
result into the integer N (or M).

The following is an explanation of the actual processing 
performed by the arithmetic unit 20 in step 3. 

In a first clock cycle, after being put in the operating
state shown in Fig. 16A by a control signal transmitted from
the control unit 10, the arithmetic unit 20 adds two pieces of
data m0 and q0, transmitted via the memory input/output unit 30
from the first memory 41, and stores the addition result in
the register 23.

In a second clock cycle, the arithmetic unit 20 adds two
pieces of data m1 and q1, transmitted from the first memory 41,
and stores the result of the addition in the register 23,
whilst simultaneously storing a lower word from a value held
in the register 23 from the first cycle in a storage area n0 in
the second memory 42.

Repetition of addition and storage in this way changes the 
value of integer N in the second memory 42 to M+Q, in other 
words M-P. 

Next, the control unit 10 determines the sign of the most 
recently stored integer N, by receiving a carry generated from 
the last calculation performed in the above described addition 
from the memory input/output unit 30. If integer N is 
determined to be positive, each word of integer N is added to 
each word of integer Q and the result substituted into integer 
M, and the control unit 10 determines the sign of this integer 
M. The two types of addition above (M+Q -> N, N+Q -> M) are 
alternated until integer M (or N) is negative. 

When a resulting integer M (or N) is negative, the control 
unit 10 transmits a control signal to the arithmetic unit 20 
via the memory input/output unit 30, thereby setting the 
operating state of the arithmetic unit 20 to that shown in 
Fig. 16B. Then, the arithmetic unit 20 adds integer M (or N) 
and integer P, and substitutes the result into integer N (or 
M) by repeating addition and storage for each word, in the 
same way as in the first half. 

Thus, the residue of integer M modulo integer P (M mod P), 
in other words the final result of the Montgomery calculation, 
is stored in the integer M in the first memory 41 or in the 
integer N in the second memory 42, completing step 3. 
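
By way of illustration only, the step 3 processing described 
above may be modelled in software as follows. This is a minimal 
sketch and not the circuit itself: the function and variable 
names are hypothetical, integers are held as lists of 32-bit 
words with the lowest word first, and it is assumed that a 
carry of 0 out of the top word after adding Q (= -P) means the 
result is negative, which is one plausible reading of how the 
control unit 10 uses the carry. The input is assumed to fit in 
n words and P is assumed to be greater than zero. 

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def to_words(x, n):
    """Split x into n words, lowest first (two's complement if x < 0)."""
    x &= (1 << (WORD_BITS * n)) - 1
    return [(x >> (WORD_BITS * i)) & MASK for i in range(n)]


def from_words(words):
    """Reassemble an unsigned integer from words, lowest first."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


def add_words(x, y):
    """Word-serial addition; returns (sum words, carry out of the top word)."""
    out, carry = [], 0
    for xw, yw in zip(x, y):
        s = xw + yw + carry
        out.append(s & MASK)
        carry = s >> WORD_BITS
    return out, carry


def final_reduce(m, p, n):
    """Model of step 3: add Q = -P repeatedly, then add P back once."""
    p_words = to_words(p, n)
    q_words = to_words(-p, n)          # Q = -P in two's complement
    src = to_words(m, n)               # buffer currently holding the value
    while True:
        dst, carry = add_words(src, q_words)   # first half: src + Q -> other buffer
        if carry == 0:                         # assumed: no carry means negative
            result, _ = add_words(dst, p_words)  # second half: add P back
            return from_words(result)
        src = dst                              # buffers swap roles


print(final_reduce(23, 7, 2))
```

In this model the two lists src and dst play the roles of 
integer M and integer N, exchanging roles on each pass in the 
same alternating manner as the first memory 41 and the second 
memory 42. 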

In this way, the multi-word arithmetic device 100 can 
execute two types of multi-word arithmetic, modular addition 
and Montgomery calculation, required for elliptic curve 
cryptology and the like, despite being provided with just one 
arithmetic unit 20. 

Furthermore, the two-word multiplication and the three-word 
addition performed respectively by the multiplier 21 and the 
three-input adder 22, and the storing of a previous 
multiplication and addition result in the memory 40 can be 
executed in parallel as different stages in a pipeline. This 
enables multi-word arithmetic to be performed at high speed. 

The multi-word arithmetic device 100 of the present 
invention has been described based on the embodiment, but the 
limitations set out thus far need not apply. 

For example, the multi-word arithmetic device 100 in this 
invention performs multi-word arithmetic on five-word 
integers, and the arithmetic unit 20 uses 32-bit word units, 
but the invention need not be limited to these numerical 
values. 

Furthermore, the multi-word arithmetic device 100 subtracts 
a modulus P from a given integer by using a method which 
involves adding an integer Q (= -P) already obtained from an 
external device, but a method in which the modulus P is 
subtracted directly may be used. 

Fig. 17 is a circuit diagram showing a construction of an 
arithmetic unit 50 in a modification of the invention enabling 
the modulus P to be subtracted directly. The arithmetic unit 
50 has a similar construction to the arithmetic unit 20 in the 
embodiment, into which a sign inverting unit 51 has been 
inserted immediately prior to the second input port in2 of the 
three-input adder 22. The sign inverting unit 51 is capable 
of inverting signs for n-word integers, and has a circuit 
construction and operating function shown respectively in 
Figs. 18A and 18B. This means that, when a least significant 
word for an n-word integer is input, the sign inverting unit 
51 inverts each bit of the word and then adds '1' to the result 
before outputting it. When a higher word is input, the sign 
inverting unit 51 inverts each bit of the word and outputs it. 
Inputting each word of integer P consecutively into such a 
sign inverting unit 51 has the same effect as inputting each 
word of integer Q (= -P) consecutively into the second input 
port in2 of the three-input adder 22. As a result, using this 
arithmetic unit 50 instead of the arithmetic unit 20 makes the 
processing in which integer Q is generated beforehand by an 
external device and passed to the multi-word arithmetic device 
100 unnecessary. 
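
As an illustration of why feeding the words of integer P 
through a sign inverting unit is equivalent to supplying Q 
(= -P), the following sketch performs M - P word by word. It 
is a software model under stated assumptions, not the circuit 
of Figs. 18A and 18B: the function name is hypothetical, words 
are 32 bits with the lowest word first, and the '1' added with 
the least significant word is modelled as an initial carry 
into the adder, so that a lowest word of P equal to zero needs 
no special handling. 

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def subtract_with_inverter(m_words, p_words):
    """Compute M - P by adding bit-inverted words of P plus an initial 1.

    Returns (difference words, carry out of the top word).
    """
    out = []
    carry = 1                          # the '1' added with the lowest word
    for mw, pw in zip(m_words, p_words):
        inverted = ~pw & MASK          # invert each bit of the word of P
        s = mw + inverted + carry
        out.append(s & MASK)
        carry = s >> WORD_BITS
    return out, carry


# 5 - 7 = -2, which is 0xFFFFFFFFFFFFFFFE in two-word two's complement
words, carry = subtract_with_inverter([5, 0], [7, 0])
print([hex(w) for w in words], carry)
```

In this model a final carry of 1 corresponds to a result that 
is zero or positive, and a final carry of 0 to a negative 
result. 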

Furthermore, the multi-word arithmetic device 100 includes 
a memory input/output unit 30 for transferring data between 
the arithmetic unit 20 and the memory 40 and between an 
external device and the memory 40, but the invention need not 
have such a limitation. These two types of data transfer may 
be performed by an external device and another data transfer 
circuit or similar, rather than by including the memory 
input/output unit 30 in the multi-word arithmetic device 100. 
Alternatively, the two types of data transfer may be performed 
by separate circuits, included in each of the arithmetic unit 
20 and the memory 40. 

Here, first and second memories 41 and 42 are each dual-port 
memories on which two separate accesses can be performed 
during one clock cycle. Alternatively, single-port memories 
operated by a clock signal provided at double the frequency 
may be used. 

The multi-word arithmetic device 100, in step 2 of the 
Montgomery calculation, adds six-word intermediate data C, the 
upper five words of the integer A and the one-word integer E, 
computing the five-word integer M. Alternatively, an integer 
AA may be taken as the upper (n+1) words of the integer A, and 
the following four values added: 

(i) a carry generated when the least significant word of each 
of intermediate data C and integer AA are added together; 

(ii) a 1-bit logical value that is '0' when the result of the 
addition (i) is '0' and '1' when the result is not '0'; 

(iii) the upper n words of the intermediate data C; and 

(iv) the upper n words of the integer AA. 

This enables the multi-word arithmetic device 100 to complete 
step 2 of the Montgomery calculation without needing to obtain 
integer E from an external device. 
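
The four-value addition above can be checked with a short 
software model. The sketch below is illustrative only and 
rests on assumptions not spelled out in this passage: the 
function name is hypothetical, the intermediate data C is taken 
to be the upper (n+1) words of B*P, where B = (A mod R) * V mod 
R, V = -P^(-1) mod R and R = 2^(k×n), and P is assumed to be 
odd. The logical value (ii) works because the lower n words of 
A + B*P are all zero, so the one-word sum in (i) is either zero 
(no carry arrives from below) or all ones (a carry arrives from 
below and ripples upward). 

```python
def montgomery_step2_without_e(a, p, n, k=32):
    """Return (A + B*P) / R using only the upper words of A and of B*P."""
    r = 1 << (k * n)                     # R = 2^(k*n)
    word_mask = (1 << k) - 1
    v = (-pow(p, -1, r)) % r             # V = -P^(-1) mod R (odd P, Python 3.8+)
    b = ((a % r) * v) % r                # B = (A mod R) * V mod R
    shift = k * (n - 1)
    c = (b * p) >> shift                 # assumed: C = upper (n+1) words of B*P
    aa = a >> shift                      # AA = upper (n+1) words of A
    low = (c & word_mask) + (aa & word_mask)   # add the lowest words of C and AA
    carry = low >> k                                  # value (i)
    nonzero = 1 if (low & word_mask) != 0 else 0      # value (ii)
    return (c >> k) + (aa >> k) + carry + nonzero     # plus values (iii) and (iv)


# Small example with 8-bit words and n = 2, so R = 2^16
m = montgomery_step2_without_e(0x12345678, 251, 2, 8)
print(m, (m * (1 << 16) - 0x12345678) % 251)
```

The value returned still has to undergo the step 3 processing, 
since it may be equal to or larger than P. 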

Furthermore, the multi-word arithmetic device 100 completes 
step 3 of the Montgomery calculation by storing a final result 
in one of the first memory 41 (integer M) and the second 
memory 42 (integer N). Alternatively, a processing similar to 
the third process in the modular addition may be added, so 
that an integer N in which the final result is stored is 
transferred to integer M. This ensures that the final result 
of the Montgomery calculation will be stored in the integer M. 

In the arithmetic unit 20, multiplication by the multiplier 
21 and accumulation by the three-input adder 22 are described 
as being performed during the same clock cycle, but a register 
may be provided between the multiplier 21 and the three-input 
adder 22, so that multiplication and addition are performed 
during two clock cycles. In other words, the pipeline of the 
arithmetic unit 20 may be divided into three stages 
(multiplication, addition, and writing into the memory 40). 
This reduces the maximum burden generated by the pipeline 
processing during a single clock cycle, and shortens its 
critical path, enabling the operating frequency of the 
arithmetic unit 20 to be raised. 

When performing Montgomery calculation in the embodiment, 
the multi-word arithmetic device 100 selects sets of word 
pairs, each set formed from all the pairs of words that 
generate a partial product with a same digit position, sets 
input values in the multiplier, and adds the result of a 
multiplication to an accumulated value stored in the register 
23. Alternatively, however, the result of the multiplication 
may be added to an accumulated partial product value via the 
memory 40. 

In this case, the memory 40 is already provided with an 
area for storing an accumulated value. The multi-word 
arithmetic device 100 may update accumulated values by (a) 
calculating a partial product, while simultaneously reading a 
one-word accumulated value from the memory 40, (b) adding the 
one-word accumulated value to a corresponding word in the 
partial product, and (c) storing the addition result in the 
corresponding area in the memory 40. This enables selection 
of the pairs of data to be multiplied to be performed with 
greater flexibility. 
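
The memory-based accumulation of steps (a) to (c) may be 
pictured with the following sketch. It is a software analogy 
under assumptions of its own: the function names are 
hypothetical, words are 32 bits with the lowest word first, a 
Python list stands in for the accumulation area in the memory 
40, and the upper word of each sum is simply carried to the 
next digit position, a detail the description above does not 
specify. 

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def words_to_int(words):
    """Reassemble an unsigned integer from words, lowest first."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))


def multiply_accumulating_in_memory(a_words, b_words):
    """Multi-word multiplication that keeps accumulated values in 'memory'."""
    acc = [0] * (len(a_words) + len(b_words))   # accumulation area
    for i, aw in enumerate(a_words):
        carry = 0
        for j, bw in enumerate(b_words):
            # (a) compute a partial product while reading the stored word,
            # (b) add the stored word (and the carried upper word) to it
            s = aw * bw + acc[i + j] + carry
            acc[i + j] = s & MASK               # (c) write the lower word back
            carry = s >> WORD_BITS              # upper word moves one position up
        acc[i + len(b_words)] = carry
    return acc


a = [0xFFFFFFFF, 0xFFFFFFFF, 0x12345678]
b = [0xFFFFFFFF, 0x9ABCDEF0]
product = multiply_accumulating_in_memory(a, b)
print(words_to_int(product) == words_to_int(a) * words_to_int(b))
```

Because every partial product is folded into the stored value 
at its own digit position, the word pairs in this model can be 
visited in orders other than the one shown, provided the 
carries are propagated accordingly. 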

Although the present invention has been fully described by 
way of examples with reference to accompanying drawings, it 
is to be noted that various changes and modifications will be 
apparent to those skilled in the art. Therefore, unless such 
changes and modifications depart from the scope of the 
present invention, they should be construed as being included 
therein. 
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CLAIMS 



What is claimed is: 



1. A multi-word arithmetic device for executing modular 
arithmetic on multi-word integers, in accordance with 
instructions from an external device, the multi-word 
arithmetic device comprising: 
a memory; 
an arithmetic unit for executing, on word units, at least 
two types of calculation, including addition and 
multiplication, and outputting a one-word calculation result; 
a memory input/output circuit for performing (1) a first 
data transfer for storing in the memory at least one integer 
received from an external device, (2) a second data transfer 
for inputting at least one integer stored in the memory into 
the arithmetic unit in word units, (3) a third data transfer 
for storing in the memory the calculation result output from 
the arithmetic unit, and (4) a fourth data transfer for 
outputting the calculation result from the memory to the 
external device; and 
a control circuit for, according to instructions received 
from the external device, 
(a) specifying, to the memory input/output unit, data to 
be transferred by the second and third data transfers, and 
(b) specifying, to the arithmetic unit, a type of 
calculation to be executed, 
thereby controlling: 
(i) the arithmetic unit to selectively perform one of at 
least two types of modular arithmetic on the at least one 
integer stored in the memory; and 
(ii) the memory input/output circuit to store the 
calculation result of the modular arithmetic into the memory. 

2. The multi-word arithmetic device of Claim 1, wherein 
at least two integers are stored in the memory, 
the arithmetic unit includes: 
an adder for adding at least two pieces of one-word data; 
and 
a multiplier for multiplying at least two pieces of 
one-word data, and 
the memory input/output circuit simultaneously reads one 
word from each of the at least two integers stored in the 
memory, and outputs the read words to one of the adder and the 
multiplier. 

3. The multi-word arithmetic device of Claim 2, wherein: 
the memory is divided into two dual-port memories, each 
allowing access to two storage areas designated by two 
addresses, and allowing (1) two read operations, or (2) one 
read operation and one write operation to be performed 
simultaneously on word units; and 
the at least two integers are stored in each dual-port 
memory so that the memory input/output circuit can 
simultaneously (1) read a piece of one-word data 
simultaneously from each of the integers stored in the two 
dual-port memories, and have the read pieces of data input 
into one of the adder and the multiplier, and (2) write a 
piece of one-word data output from one of the adder and the 
multiplier into one of the two dual-port memories. 

4. The multi-word arithmetic device of Claim 1, wherein 
the arithmetic unit, according to instructions from the 
control circuit, executes one of the following three 
calculations: (1) addition of at least two pieces of one-word 
data; (2) multiplication of two pieces of one-word data; and 
(3) multiplication of two pieces of one-word data and 
accumulation of multiplication results. 

5. The multi-word arithmetic device of Claim 4, wherein 
the arithmetic unit includes: 
a multiplier receiving an input of two pieces of one-word 
data and outputting a piece of two-word data; 
an adder receiving an input of at least two pieces of 
two-word data, including a piece of two-word data output from 
the multiplier, and outputting a piece of multi-word data; and 
a selecting circuit selecting, according to instructions 
from the control circuit: 
(1) data to be input into one of the multiplier and the 
adder out of data transmitted from the memory input/output 
circuit; and 
(2) data to be output as the calculation result out of data 
output from one of the adder and the multiplier. 

6. The multi-word arithmetic device of Claim 1, wherein 
the at least two types of modular arithmetic include modular 
addition, and 
on receiving, from the external device, an instruction to 
execute modular addition and an indication of a number of 
words n for each integer on which modular addition is to be 
performed, the control circuit controls the memory 
input/output circuit and the arithmetic unit to execute the 
following processing: 
(1) the memory input/output circuit obtains from the 
external device and stores in the memory two n-word integers A 
and B on which modular addition is to be executed and a n-word 
integer P showing a modulus; 
(2) the memory input/output circuit (a) reads 
simultaneously, from the integers A, B and P stored in the 
memory, pieces of one-word data a, b and p, each with a same 
digit position, and has the read pieces of data input into the 
arithmetic unit, while (b) storing in the memory a piece of 
one-word data w output from the arithmetic unit, and repeats 
processes (a) and (b) sequentially from a lowest-order word in 
each integer until n words of data are obtained, enabling an 
n-word integer W to be stored in the memory; and 
(3) the arithmetic unit repeats n times a process in which 
the pieces of data a, b and p received from the memory 
input/output circuit are computed as a + b - p, propagating a 
carry, and a result w is output. 

7. The multi-word arithmetic device of Claim 6, wherein 
the control circuit determines whether a carry has been 
generated by the arithmetic unit immediately after completion 
of the processing (1) to (3) and if a carry has been 
generated, further controls the memory input/output circuit 
and the adder to execute the following processing: 
(4) the memory input/output circuit (a) reads 
simultaneously, from the integers W and P stored in the 
memory, pieces of one-word data w and p, each with a same 
digit position, and has the read pieces of data input into the 
arithmetic unit, while (b) storing in the memory a piece of 
one-word data c output from the arithmetic unit and repeats 
processes (a) and (b) sequentially from a lowest-order word in 
each integer until n words of data are obtained, enabling an 
n-word integer C to be stored in the memory; and 
(5) the arithmetic unit repeats n times a process in which 
the pieces of data w and p received from the memory 
input/output circuit are computed as w + p, propagating a 
carry, and a result c is output. 

8. The multi-word arithmetic unit of Claim 1, wherein the 
at least two types of modular arithmetic include Montgomery 
reduction calculating a residue for A*R^(-1) mod P, when each 
word has k bits, A is a 2n-word integer used for input data, R 
is an integer 2^(k×n) and P is an n-word integer; and 
upon receiving, from the external device, an instruction to 
execute Montgomery reduction and an indication of a number of 
words 2n for an integer A on which Montgomery reduction is to 
be performed, the control circuit controls the memory 
input/output circuit and the arithmetic unit to execute 
Montgomery reduction. 

9. The multi-word arithmetic device of Claim 8, wherein, 
when receiving an instruction to execute Montgomery reduction 
from the external device, the control circuit controls the 
memory input/output circuit and the arithmetic unit so as to 
execute the following processing: 
(1) the memory input/output circuit acquires integers A, P 
and V from the external device and stores the obtained 
integers in the memory, the integer V being -P^(-1) mod R; 
(2) the arithmetic unit computes partial products for words 
from each of (i) a lower n words of the integer A stored in 
the memory, and (ii) the integer V, and accumulates words in 
partial products having a same digit position, repeating the 
process sequentially from a lowest word in each integer until 
n words of accumulated results are obtained, and storing the 
accumulated results in the memory as a piece of n-word 
intermediate data B; 
(3) the arithmetic unit computes partial products for words 
from each of (a) the piece of intermediate data B and (b) the 
integer P stored in the memory, and accumulates words in the 
partial products having a same digit position so that, when a 
lowest word is a 0th word, accumulated results for a 0th to 
(n-3)th word are not obtained, but accumulated results for a 
(n-2)th word to a (2n-1)th word are obtained and stored in the 
memory as the upper (n+1) words of a piece of intermediate 
data D; 
(4) the arithmetic unit (a) generates (i) a carry obtained 
from a one-word addition performed by adding a lowest word 
from each of the piece of intermediate data D and an integer 
AA, and (ii) a one-bit logical value, the integer AA being an 
upper (n+1) words of the integer A, and the one-bit logical 
value being 0 when a one-word addition result is 0, and 1 when 
the one-word addition result is not 0, and (b) adds an upper n 
words of the piece of intermediate data D, an upper n words of 
the integer AA, the carry and the one-bit logical value, by 
repeating addition of word units sequentially from a lowest 
word in each integer, while propagating a carry, until n words 
of data are obtained, and stores an addition result in the 
memory as a piece of n-word output data M; and 
(5) when the output data M stored in the memory is at least 
as large as the integer P, the arithmetic unit subtracts the 
integer P from the output data M until the output data M is 0 
or a positive integer smaller than the integer P, by repeating 
subtraction of word units sequentially from a lowest word in 
each integer, while propagating a carry, until n words of data 
are obtained, and stores the subtraction results in the memory 
as a new piece of n-word output data M. 

10. The multi-word arithmetic device of Claim 9, wherein 
in processing (4), the arithmetic unit adds a piece of 
one-word data containing all ones to the piece of intermediate 
data D and the integer AA, and stores an upper n words of an 
obtained addition result in the memory as the output data M. 

11. The multi-word arithmetic device of Claim 10, wherein, 
in processing (2) and (3), the arithmetic unit selects sets of 
word pairs, each set formed from all the pairs of words that 
generate a partial product with a same digit position, sets 
input values in the multiplier, and computes and accumulates 
the partial products for the selected pairs of words in 
sequence from the set with a lowest digit position. 

12. The multi-word arithmetic device of Claim 11, wherein, 
in processing (2) and (3), the arithmetic unit stores in the 
memory as part of a multiplication result a lower word from a 
two-word accumulated result obtained by accumulating partial 
products with the same digit position, and adds an upper word 
from the accumulated result to partial products that have a 
digit position one word higher and are thus the next to be 
calculated. 

13. The multi-word arithmetic device of Claim 12, wherein 
the arithmetic unit performs an operation for storing a lower 
word from the accumulated result in the memory simultaneously 
with an operation for adding an upper word from the 
accumulated result to partial products that have a digit 
position one word higher and are thus the next to be 
calculated. 

14. The multi-word arithmetic device of Claim 10, wherein, 
when computing and accumulating partial products in processing 
(2) and (3), the arithmetic unit updates accumulated values by 
(a) simultaneously (i) computing a partial product and (ii) 
reading a previously accumulated one-word value from the 
memory, (b) adding the accumulated one-word value to a 
corresponding word in the partial product, and (c) storing a 
result of the addition in a corresponding area of the memory. 



15. A multi-word arithmetic device for executing modular 
arithmetic on multi-word integers, in accordance with 
instructions from an external device, the multi-word 
arithmetic device comprising: 
a memory; 
an arithmetic unit for executing, on word units, at least 
two types of calculation, including addition and 
multiplication, and outputting a one-word calculation result; 
a memory input/output circuit for performing (1) a first 
data transfer for storing in the memory at least one integer 
received from an external device, (2) a second data transfer 
for inputting at least one integer stored in the memory into 
the arithmetic unit in word units, (3) a third data transfer 
for storing in the memory the calculation result output from 
the arithmetic unit, and (4) a fourth data transfer for 
outputting the calculation result from the memory to the 
external device; and 
a control circuit for, according to instructions received 
from the external device, 
(a) specifying, to the memory input/output unit, data to 
be transferred by the second and third data transfers, and 
(b) specifying, to the arithmetic unit, a type of 
calculation to be executed, 
thereby controlling: 
(i) the arithmetic unit to selectively perform one of at 
least two types of modular arithmetic on the at least one 
integer stored in the memory; and 
(ii) the memory input/output circuit to store the 
calculation result of the modular arithmetic into the memory, 
wherein the at least two types of modular arithmetic 
include modular addition and Montgomery reduction; and 
the control circuit controls the memory input/output 
circuit and the arithmetic unit so that the arithmetic unit 
(1) computes A+B mod P when an instruction for executing 
modular addition is received from the external device, A, B 
and P being n-word integers, and (2) computes a residue for 
A*R^(-1) mod P, when an instruction for executing Montgomery 
reduction is received from the external device, each word 
having k bits, A being a 2n-word integer used as input data, R 
being an integer 2^(k×n) and P being an n-word integer. 



16. The multi-word arithmetic unit of Claim 15, wherein 
the arithmetic unit includes: 
a multiplier receiving an input of two pieces of one-word 
data and outputting a piece of two-word data; 
an adder receiving an input of at least two pieces of 
two-word data, including a piece of two-word data output from 
the multiplier, and outputting a piece of multi-word data; and 
a selecting circuit selecting, according to instructions 
from the control circuit: 
(1) data to be input into one of the multiplier and the 
adder out of data transmitted from the memory input/output 
circuit; and 
(2) data to be output as the calculation result out of data 
output from one of the adder and the multiplier. 

17. The multi-word arithmetic unit of Claim 16, wherein 
the memory is divided into two dual-port memories, each 
allowing access to two storage areas designated by two 
addresses, and allowing (1) two read operations, or (2) one 
read operation and one write operation to be performed 
simultaneously on word units; and 
the at least two integers are stored in each dual-port 
memory so that the memory input/output circuit can 
simultaneously (1) read a piece of one-word data 
simultaneously from each of the integers stored in the two 
dual-port memories, and have the read pieces of data input 
into one of the adder and the multiplier, and (2) write a 
piece of one-word data output from one of the adder and the 
multiplier into one of the two dual-port memories. 
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ABSTRACT OF THE DISCLOSURE 

A multi-word arithmetic device, capable of executing a 
variety of types of multi-word arithmetic required for 
elliptic curve cryptology, includes the following. A memory 
40, formed from two dual-port memories 41 and 42, temporarily 
stores n-word integers on which calculation is performed, and 
a calculation result. An arithmetic unit 20 executes two or 
more types of calculation, including addition and 
multiplication, on each word, and outputs a one-word result. 
A memory input/output unit 30 supplies a maximum of three 
pieces of one-word data from the memory 40 to the arithmetic 
unit 20, while simultaneously storing a one-word calculation 
result from the arithmetic unit 20 in the memory 40. A 
control unit 10 controls the arithmetic unit 20 and the memory 
input/output unit 30 so as to have the arithmetic unit execute 
one of modular addition and Montgomery reduction on n words. 
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